ABSTRACT


Behavioral records collected through course assessments, peer 
assignments, and programming assignments in Massive Open 
Online Courses (MOOCs) provide multiple views about a 
student’s study style. Study behavior is correlated with
whether the student obtains a certificate or drops out
of a course. It is therefore important to identify
the particular behavioral patterns and establish an accurate
predictive model for the learning results, so that tutors can
give well-focused assistance and guidance to specific students.
However, the behavioral records of individuals are
usually very sparse; behavioral records between individuals
are inconsistent in time and skewed in contents. These remain
big challenges for the state-of-the-art methods. In this
paper, we engage the concept of subgroup as a trade-off to 
overcome the sparsity of individual behavioral records and 
inconsistency between individuals. We employ the frame- 
work of Exceptional Model Mining (EMM) to dis- 
cover exceptional student behavior. Various model classes 
of EMM are applied on dropout rate analysis, correlation 
analysis between length of learning behavior sequence and 
course grades, and passing state prediction analysis. Quali- 
tative and quantitative experimental results on real MOOCs 
datasets show that our method can discover significantly in- 
teresting learning behavioral patterns of students. 
Wouter Duivesteijn
Eindhoven University of Technology
w.duivesteijn@tue.nl

Mykola Pechenizkiy
Eindhoven University of Technology
m.pechenizkiy@tue.nl


1. INTRODUCTION


Massive Open Online Courses (MOOCs) make it possible for
educators to analyze learning behavior of students in multiple
views. In contrast to traditional classes, which only
have limited learning behavioral records, MOOC platforms
such as Coursera, edX and Udacity provide huge amounts
of learning behavioral records. These platforms collect very
detailed course information and students’ learning behavior
such as course assessments, peer assignments, programming 
assignments, forum discussions and feedback [19], which can 
reflect the knowledge and skill achievements and the study 
performance of students. Modeling students’ learning be- 
havior and trying to discover interesting behavioral patterns 
are non-trivial. Most recent research is focused on how to 
predict the learning results based on the learning behavior 
model. It can help the tutors to design the courses and give 
specific guidance and assistance to specific students. How- 
ever, due to the complexity of the behavioral records, there 
are still several challenges to be overcome: 


Individual sparsity. Even when many students are enrolled
in a course, the duration of their involvement varies
substantially. Figure 1a displays a histogram of assessment
question frequencies, which shows an obvious Power-Law
distribution [2]. Only a few students participate in hundreds
of assessment questions. Most students have an
activity length of fewer than 20 records, which is very sparse.
This makes user modeling methods based on evolutionary
activity sequences [16, 17] ineffective.


Activity inconsistency. Beyond the distribution in ac- 
tivity length of assessment questions, students’ learning be- 
havior in forum discussion, click stream and peer review are 
also shown to follow a Power-Law distribution. In Table 
4, we can see that among the 18 courses on Coursera, en- 
rolled students, grades and students who passed the course 
are highly diverse. This inconsistency makes the data very
imbalanced, which causes difficulties for modeling methods
based on matrix factorization [24]. These methods might
merge infrequent behavior with common behavior.


Content heterogeneity. Behavior diversity is not only 
shown in activity length and course status, but also shown in 
informative contents. There are 7 types of assessments and 
12 types of questions in the courses, such as video, summa- 
tive, checkbox and multiple checkbox. Proportions of these 
assessments and questions are skewed in different courses. 
On the other hand, students also have varying participa- 
tion records on these contents. In Figure 2, it is shown that 
distributions of students are obviously different in specific 
demographic categories. It is a big challenge for modeling 
methods to handle these heterogeneous contents for tasks 
like dropout prediction or passing state prediction. 
Figure 1: Heterogeneity and inconsistency of student behavior. (a) Histogram of the number of assessment questions in which students participate. (b) Performance per assessment type; ‘meq’ represents multiple checkbox questions.

Figure 2: Student distributions across various demographic categories.


To overcome these challenges, we propose to employ Ex- 
ceptional Model Mining (EMM) [4] for exceptional learning 
behavior analysis. Instead of looking for anomalies or out- 
liers of individuals, we look for exceptional behavior on the 
subgroup level [7], which can provide interpretable descrip- 
tions such as ‘Students: Country = US, Region = Manhat- 
tan, Join dates > 365 (days)’ having exceptional learning 
behaviors that are predominantly different from those in 
the whole dataset. We employ EMM to discover interest- 
ing learning behavioral patterns in subgroups. We establish 
various model classes for specific learning behaviors, such as 
discovering correlation between length of behavior sequence 
and course grades, finding out subgroups with exceptional 
dropout ratio, and looking for specific subsets where the clas- 
sifier does not perform well. Experimental results on a real 
dataset illustrate the type of meaningful learning behavioral 
patterns EMM can discover in MOOCs. This can help us
build an improved behavior model in future research. In
summary, our main contributions are: 


1. We apply Exceptional Model Mining (EMM) to learning
behavior analysis in MOOCs, which can help us to
overcome the sparsity, inconsistency and heterogeneity
in the behavioral records.


2. We employ several EMM model classes for different
tasks to discover exceptional learning behaviors on the
subgroup level. Our results show very interesting learning
behavioral patterns, which can help tutors give
specific guidance and assistance to the students.


2. RELATED WORK

Local Pattern Mining (LPM) [6, 14] is a subfield of data
mining, concerned with discovering subsets of the dataset 
at hand where something interesting is going on. Typically, 
a restriction is imposed on what kind of subsets we are inter- 
ested in: only those subsets that can be formulated within 
a predefined description language are allowed. A canonical 
choice for this language is conjunctions of conditions on at- 
tributes of the dataset. Hence, if the records in our dataset 
concern people, then LPM finds results of the form: 


\[ \text{Age} > 45 \;\wedge\; \text{Smoker} = \text{yes} \;\rightsquigarrow\; \text{interesting} \]


This ensures that the results we find with an LPM method 
are relatively easy to interpret for a domain expert: the 
subsets will be expressed in terms of quantities with which 
the expert is familiar. We call a subset that can be expressed 
in such a way a subgroup. 


Different LPM methods give a different answer to the ques- 
tion what exactly constitutes “where something interesting 
is going on”. The most famous form of LPM is Frequent 
Itemset Mining (FIM) [1], where interestingness is equiva- 
lent to occurring unusually frequently: things that happen 
often are interesting. Hence, FIM finds results of the form: 


\[ \text{Age} > 45 \;\wedge\; \text{Smoker} = \text{yes} \;\rightsquigarrow\; \text{(high frequency)} \]


The methods we are mainly concerned with in this paper, 
however, seek a more complex concept on the right-hand 
side of this arrow. The task of Subgroup Discovery (SD) 
[9, 23, 7] typically singles out one binary attribute of the 
dataset as the target: subgroups are deemed interesting if 
this one target has an unusual distribution, as compared to 
its distribution on the entire dataset. In our example, if the 
target column describes whether the person develops lung
cancer or not, SD finds results of the form: 


\[ \text{Smoker} = \text{yes} \;\rightsquigarrow\; \text{lung cancer} = \text{yes} \]

\[ \text{Age} < 25 \;\rightsquigarrow\; \text{lung cancer} = \text{no} \]


These subgroups make intuitive sense in terms of our knowl- 
edge of the domain. Smokers have a higher-than-usual in- 
cidence of lung cancer. People under the age of 25 often 
have not yet had the chance to develop lung cancer, so the 
incidence in this group will be lower. When the connec- 
tion between subgroup and unusual target distribution is 
not immediately intuitively clear, the result of SD is a new 
hypothesis to be investigated by the domain experts. 


2.1 Exceptional Model Mining 

Exceptional Model Mining (EMM) [12, 4] can be seen as an 
extension of SD: instead of a single target, EMM typically se- 
lects multiple target columns. A specific kind of interaction 
between these targets is captured by the definition of a model 
class. EMM finds a subgroup to be interesting when this in- 
teraction is exceptional, as captured by the definition of a 
quality measure. For instance, when two numerical columns
are selected as the targets, we can consider Pearson’s correlation
$\rho$ as the model class. Quality measures for this model
class could be $\rho$ itself (to find subgroups on which the target
correlation is unusually high), $-\rho$ (to find subgroups with
unusually strongly negative target correlation), $|\rho|$ (to find
subgroups with unusually strong positive or negative target
correlation), or $-|\rho|$ (to find subgroups with unusually weak
target correlation). Hence, the model class fixes the type of
target interaction in which we are interested, and the quality
measure fixes what, within this type of interaction, we
find interesting. Several model classes have been defined
and explored, for instance Bayesian networks [5] and regression
[3]. Popular quality measures for SD/EMM include
WRAcc [10], z-score [13], and KL divergence [11].
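
As an illustration of how the model class and the quality measure divide the work, the following minimal Python sketch computes Pearson's correlation on the target values of a subgroup and then applies one of the four quality measures listed above. The function names, the dictionary keys, and the toy inputs are hypothetical and chosen only for this example.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# The model class is "correlation between two targets"; each quality
# measure turns the subgroup's correlation into a score to be maximized.
QUALITY_MEASURES = {
    "rho": lambda rho: rho,               # unusually high correlation
    "neg_rho": lambda rho: -rho,          # unusually negative correlation
    "abs_rho": lambda rho: abs(rho),      # unusually strong, either sign
    "neg_abs_rho": lambda rho: -abs(rho), # unusually weak correlation
}

def subgroup_quality(xs, ys, measure):
    """Score a subgroup, given the two target columns restricted to it."""
    return QUALITY_MEASURES[measure](pearson(xs, ys))
```

Swapping the dictionary key changes what counts as interesting without touching the model class itself.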


2.2 Learning Behavior Modeling 

Learning behavior modeling for students in MOOCs is gen- 
erally aimed at predictive analytics such as dropout predic- 
tion, passing state prediction, and grades prediction. For 
instance, latent factors and state machines are employed to 
model the hidden study state of students for a predictive 
task [18, 16, 21]. Khajah et al. [8] integrate Latent factor 
and knowledge tracing with a hierarchical Bayesian model, 
which can consider the study skill for prediction tasks. Re- 
current neural network and LSTM have been used to model 
study trajectories for the learning results prediction [15, 22]. 
Most of these existing methods focus on modeling individual 
behavior but do not consider the sparsity, inconsistency and 
heterogeneity of learning behavior data. Our methods focus 
on discovering exceptional learning behaviors on the sub- 
group level, which provide interpretable information about 
where the predictive model does not perform well. This al- 
lows us to establish an improved model for prediction tasks 
for both normal and exceptional behavioral patterns. 


3. PRELIMINARIES 

We assume a dataset $\Omega$: a bag of $N$ records $r \in \Omega$ of the
form $r = (a_1, \ldots, a_k, l_1, \ldots, l_m)$, where $k$ and $m$ are positive
integers. We call $a_1, \ldots, a_k$ the descriptive attributes
or descriptors of $r$, and $l_1, \ldots, l_m$ the target attributes or
targets of $r$. The descriptive attributes are taken from an
unrestricted domain $\mathcal{A}$. Mathematically, we define descriptions
as functions $D: \mathcal{A} \to \{0, 1\}$. A description $D$ covers a
record $r^i$ if and only if $D(a_1^i, \ldots, a_k^i) = 1$.


DEFINITION 1. A subgroup corresponding to a description
$D$ is the bag of records $G_D \subseteq \Omega$ that $D$ covers, i.e.:


\[ G_D = \{ r^i \in \Omega \mid D(a_1^i, \ldots, a_k^i) = 1 \} \]
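
Definition 1 can be illustrated with a small Python sketch in which a description is a conjunction of single-attribute conditions and the subgroup is the list of records it covers. The attribute names, the operator set, and the toy records are hypothetical; they only mimic the kind of demographic descriptors used later in this paper.

```python
def make_description(conditions):
    """Build D as a conjunction of (attribute, operator, value) conditions."""
    ops = {
        "=": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        ">": lambda a, b: a > b,
        "<=": lambda a, b: a <= b,
    }

    def description(record):
        # D(record) is True only if every condition holds.
        return all(ops[op](record[attr], value) for attr, op, value in conditions)

    return description

def subgroup(dataset, description):
    """G_D: the bag of records in the dataset that D covers."""
    return [r for r in dataset if description(r)]

# Toy records with descriptive attributes only (hypothetical values).
dataset = [
    {"country": "KR", "gender": "male", "join_days": 400},
    {"country": "KR", "gender": "female", "join_days": 120},
    {"country": "US", "gender": "male", "join_days": 800},
]
D = make_description([("country", "=", "KR"), ("gender", "!=", "female")])
covered = subgroup(dataset, D)
```

Because the description only inspects descriptors, the same subgroup can be handed to any quality measure defined on the targets.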


This merely formalizes the standard LPM conditions: we 
seek subgroups that are defined in terms of conditions on 
the descriptors, hence our results are interpretable. Those 
conditions select a subset of the records of the dataset: those 
records that satisfy all conditions. These subgroups must be 
evaluated, which is done by the quality measure: 


DEFINITION 2. A quality measure is a function $\varphi: \mathcal{D} \to \mathbb{R}$
that assigns a numeric value to a description $D$. Occasionally,
we use $\varphi(G)$ to refer to the quality of the induced
subgroup: $\varphi(G_D) = \varphi(D)$.


Typically, a quality measure assesses the subgroup at hand 
based on some interaction on the target columns. Hence, a 
description and a quality measure interact through different 
partitions of the dataset columns; the former focuses on the 
descriptors, the latter focuses on the targets, and they are 
linked through the subgroup. 


Since subgroups select subsets of the dataset at hand, and 
many such subsets exist, we need to employ a search strategy 
to ensure that we find good results in a reasonable amount 
of time. To do so, we employ the beam search algorithm as 
outlined in [4, Algorithm 1]. This algorithm holds the mid- 
dle ground between a pure greedy search algorithm, which 
is likely to quickly end up in a local optimum, and an ex- 
haustive search, which is likely to require too much time for 
providing the global optimum. Beam search builds up candi- 
date subgroups in a level-wise manner, by imposing a single 
condition on a single attribute at each step of the search. 
In subsequent steps, promising candidates are refined, by 
conjoining to these candidates all possible additional single 
conditions on a single attribute, and evaluating the results. 
A purely greedy approach would, at each step, refine the 
single most promising candidate. By contrast, beam search 
refines a fixed number w (the beam width) of most promising 
candidates at each step. The larger the choice of w, the more 
likely we are to escape local optima, and the longer the algo- 
rithm will take. An additional parameter of beam search is 
the number d (the search depth), which sets an upper limit 
to the number of steps in the search process. Hence, by de- 
sign, any subgroup resulting from a beam search procedure 
must be defined as a conjunction of at most d conditions 
on single attributes. The larger the choice of d, the more 
expressive the results are; the smaller the choice of d, the 
easier the results are to interpret. 
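
The level-wise refinement described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reproduction of [4, Algorithm 1]: conditions are passed in as precomputed (name, predicate) pairs, descriptions covering no records are skipped, and there is no minimum-coverage constraint or handling of numeric split points. All names are hypothetical.

```python
import heapq

def beam_search(dataset, conditions, quality, width, depth):
    """Beam search over conjunctions of single-attribute conditions.

    conditions: list of (name, predicate) pairs, predicate(record) -> bool
    quality:    function mapping the covered records to a number
    Returns (quality, description) pairs, best first; a description is a
    sorted tuple of condition names.
    """
    predicates = dict(conditions)
    beam = [()]                      # the empty description covers every record
    results = {}
    for _ in range(depth):
        level = {}
        for description in beam:
            for name in predicates:
                if name in description:
                    continue
                refined = tuple(sorted(description + (name,)))
                if refined in level:
                    continue         # same conjunction reached via another order
                covered = [r for r in dataset
                           if all(predicates[c](r) for c in refined)]
                if covered:          # skip descriptions that cover nothing
                    level[refined] = quality(covered)
        if not level:
            break
        results.update(level)
        # Keep only the w most promising candidates for the next level.
        beam = heapq.nlargest(width, level, key=level.get)
    return sorted(((q, d) for d, q in results.items()), reverse=True)
```

With `width=1` this degenerates into the purely greedy search mentioned above, and `depth` caps the number of conditions in any returned description.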


4. EXCEPTIONAL LEARNING BEHAVIOR 
ANALYSIS 


Our dataset originates from the learners involved in the EIT
Digital MOOCs at Coursera. EIT Digital, as part of the
European Institute for Innovation and Technology, aims to
drive Europe’s digital transformation, also for education.
The EIT Digital academy is focused on mobility and entrepreneurship
and is at the forefront of integrating education,
research, and business. The MOOCs in the online
programme have been developed by the partner universities
involved in the EIT Digital Master School in Embedded
Systems, in a best-of-breed approach.


Together, the MOOCs form the EIT Digital online programme
“Internet of Things through Embedded Systems”.
The online programme aims to build the reputation of EIT
Digital, the partner universities, and the involved teachers.
It also helps to renew pedagogy through scalable education
technologies and data driven education. Learning analytics
are at the core of this feedback mechanism. The online
programme is comparable to an edX MicroMasters programme and
similarly offers an online equivalent of a 25 ECTS first semester;
it allows learners to study at their own
pace, any time, any place. Moreover, they can first have
a try before they commit themselves to the whole master
programme. Once selected and admitted on campus, the
learners can finish the double degree master programme of
EIT Digital Master School in Embedded Systems.


Figure 3: Dropout ratio of students by country.


Table 1: Exceptional dropout rate in subgroups. Results show subgroups with highly exceptional dropout rate. The overall dropout rate is 0.4286.

$D$ | $\varphi_{\mathrm{WRAcc}}$ | dropout | $|G_D|$
Country = OM, Was Group Sponsored != True, Was Finaid Grant != True | 0.0338 | 0.0 | 42
Region = MOW, Gender != male, Join Date <= 1011, Join Date > 389 | 0.0336 | 0.0 | 57
Country = KR, Gender != female, Profile language != ko | 0.0330 | 0.7812 | 32
Country = KR, Educational status != MASTERS DEGREE, Gender != female, Was Group Sponsored != True | 0.0313 | 0.7742 | 34
Country = KR, Was Group Sponsored != True | 0.0304 | 0.7222 |


Table 2: Exceptional correlation analysis between length of behavior sequence and course grades. The overall correlation coefficient $\rho$ is 0.7406.

$D$ | $\varphi_{\mathrm{scd}}$ | $\hat{r}$ | $|G_D|$
Country = LT, Join Date > 701, Browser language != et-EE | 0.9999 | 0.9782 | 11
Region = 6 | 0.9994 | -0.1272 | 10
Region = QUE | 0.9992 | -0.0788 | 11
Country = NP | 0.9985 | 0.9630 | 11
Browser language = es-MX | 0.9973 | 0.1203 | 7


Table 3: Exceptional classifier behavior for course passing state prediction. Results indicate that the classifier cannot work well on these exceptional subgroups.

$D$ | $\varphi_{F_1}$ | $|G_D|$
Country = OM, Profile language = en-US, Browser language != en-US, Educational status != BACHELOR DEGREE | 0.5051 | 32
Country = OM, Profile language != en-US | 0.4058 | 22
Region = MA, Gender = female, Educational status = COLLEGE NO DEGREE | 0.3489 | 24
Country = OM, Met Payment Condition != True | 0.3464 | 31
Join Date <= 390, Region != MA | 0.3193 | 28


Figure 2 displays the distributions of students across various
demographic categories. In order to capture the inherent
imbalance, we use demographic columns as the left-hand
attributes, to formulate subgroup descriptions. During data
preprocessing, we convert the join dates, which represent
how long a student has been registered in Coursera, from
the ‘Datetime’ format to an integer number of days. The following
three sections illustrate what kind of discoveries can be
made by wielding various tools from the EMM toolbox.
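
The join-date preprocessing mentioned in this section can be sketched as follows. The paper does not state which reference moment the registration duration is measured against, so the snapshot date below is purely an assumption, as are the function and variable names.

```python
from datetime import datetime

def join_date_to_days(join_date, reference):
    """Whole days between a student's registration and a reference moment."""
    return (reference - join_date).days

# Hypothetical snapshot date; the actual reference used is not given.
reference = datetime(2017, 1, 1)
joined = datetime(2016, 1, 2, 15, 30)
days = join_date_to_days(joined, reference)
```

Turning the datetime into an integer lets the search impose threshold conditions such as `Join Date > 389` on this attribute.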


4.1 Exceptional Dropout Rate Analysis 

In this section, our task is to find out the subgroups which 
have significantly different dropout rate compared with the 
whole dataset. For the purposes of this paper, we define 
a dropout student to be a student who has participated in 
at least one assessment question, but has not obtained an 
overall course grade. In Figure 3, we present the highest- 
frequency countries, and the dropout rate of students in 
those countries. From the figure we can see that both fre- 
quency and dropout rate vary a lot. The high dropout rate 
is usually seen as a defect of MOOCs. If we were to discover 
what kinds of students have exceptional dropout rates, then 
that would allow us to direct specific guidance to those stu- 
dents that most require it. Traditional partition and clus- 
tering methods are not qualified for this task, because they 
cannot provide interpretable results about the subsets of stu- 
dents and quantitative information about how different the 
subsets of students are from the whole dataset. To address 
this problem, we propose to use subgroups to partition
the whole dataset, and to look for subgroups whose dropout
rate is most exceptional compared with the whole dataset,
employing Weighted Relative Accuracy (WRAcc) [20]:


\[ \varphi_{\mathrm{WRAcc}}(D) = \frac{|G_D|}{N} \left( \frac{S_D}{|G_D|} - \frac{S_\Omega}{N} \right) \]

Here, $|G_D|$ represents the number of records covered by subgroup
description $D$, $S_D$ represents the number of dropout
students in subgroup $G_D$, $S_\Omega$ represents the total number of
dropout students in the whole dataset, and $N$ represents the
number of students who joined this course and participated in
at least one assessment question.
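
As a worked illustration of WRAcc, the short Python sketch below multiplies the coverage of a subgroup by the difference between its dropout rate and the overall dropout rate. The function name and the numbers in the example are hypothetical and are not taken from the dataset.

```python
def wracc(subgroup_size, subgroup_dropouts, total_dropouts, n):
    """Weighted Relative Accuracy of a subgroup for a binary target.

    subgroup_size:     |G_D|, records covered by the description
    subgroup_dropouts: S_D, dropout students inside the subgroup
    total_dropouts:    dropout students in the whole dataset
    n:                 N, all students under consideration
    """
    coverage = subgroup_size / n
    return coverage * (subgroup_dropouts / subgroup_size - total_dropouts / n)

# Hypothetical numbers: 50 of 1000 students are covered, 40 of them drop
# out, against 400 dropouts overall: 0.05 * (0.8 - 0.4), roughly 0.02.
example = wracc(subgroup_size=50, subgroup_dropouts=40,
                total_dropouts=400, n=1000)
```

The coverage factor keeps tiny subgroups with extreme dropout rates from dominating the ranking, and the sign shows whether the subgroup drops out more or less often than average.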


The beam search algorithm as described in [4, Algorithm 1]
is parameterized with beam width 20 and search depth 4.
The overall dropout rate is 0.4286. In Table 1, we present
the top-5 subgroups with most exceptional dropout rate.
The subgroup with description “D: Region = MOW, Gender
!= male, Join Date between 389 and 1011” has a dropout rate
of zero: all students in that subgroup complete the course.
On the other hand, the subgroup with description “D: Country
= KR, Gender != female and Profile language != ko”
has an elevated dropout rate of 0.7812: most of these students
drop out. Based on these results, we can conclude that
Korean males who have set their profile language to something
other than Korean are in need of more attention. This
may be a group of students who are foreigners in Korea, or
Koreans who are studying in a language which is non-native
to them. By identifying such at-risk groups, educators can
more effectively channel their remedial activities.


Figure 4: Exceptional correlations in subgroups.


4.2 Exceptional Correlation Analysis 

Generally, more active students can be expected to obtain 
higher grades. To investigate this phenomenon, we look into 
the relation between the activity length (denoted by $q$) of
students and the overall grades (denoted by $g$) in a course.
We use the correlation model class for EMM to realize
this task. In this model class, we estimate the correlation
coefficient by calculating the sample correlation as follows:


\[ \hat{r} = \frac{\sum_i (q^i - \bar{q})(g^i - \bar{g})}{\sqrt{\sum_i (q^i - \bar{q})^2 \sum_i (g^i - \bar{g})^2}} \]

\[ z' = \frac{1}{2} \ln\left( \frac{1 + \hat{r}}{1 - \hat{r}} \right), \qquad z^* = \frac{z' - z^C}{\sqrt{\frac{1}{|G_D| - 3} + \frac{1}{|G_D^C| - 3}}} \tag{1} \]


Here, $\hat{r}$ represents the sample correlation, $q^i$, $g^i$ represent the
activity length and course grade of each student, and $\bar{q}$, $\bar{g}$
represent their average values over the dataset. Equation
(1) is the Fisher z transformation; $z'$ in the lower equation
represents the $z'$ computation on the subgroup and $z^C$ on
its complement, $|G_D|$ represents the number of records
covered by the subgroup with description $D$, and $|G_D^C|$ the
number of records in its complement. Under the null
hypothesis that the correlation between $q$ and $g$ is the same
inside and outside of the subgroup, $z^*$ follows a standard
normal distribution. Hence, the value for $z^*$ implies a
p-value under this null hypothesis. Leman et al. [12] propose
to use one minus this p-value as quality measure $\varphi_{\mathrm{scd}}$: the
higher this value is, the more certain we are that the null
hypothesis is false and hence exceptional correlations are
observed.
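
The significance-based quality measure can be sketched in Python as follows. The sketch assumes the usual form of the test: the Fisher z transformation $z' = \frac{1}{2}\ln\frac{1+\hat{r}}{1-\hat{r}}$ is applied to the correlation in the subgroup and in its complement, and the difference is divided by $\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$. It further assumes a two-sided p-value, which the text does not spell out. Function names are hypothetical; inputs must satisfy $|\hat{r}| < 1$ and group sizes above 3.

```python
import math

def fisher_z(r):
    """Fisher z transformation of a correlation coefficient, |r| < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))

def scd_quality(r_sub, n_sub, r_comp, n_comp):
    """One minus the two-sided p-value for 'same correlation inside and
    outside the subgroup' (the two-sided choice is an assumption)."""
    z_star = (fisher_z(r_sub) - fisher_z(r_comp)) / math.sqrt(
        1 / (n_sub - 3) + 1 / (n_comp - 3)
    )
    # For a standard normal Z, P(|Z| <= |z*|) = erf(|z*| / sqrt(2)),
    # which equals one minus the two-sided p-value.
    return math.erf(abs(z_star) / math.sqrt(2))
```

Because the statistic is scaled by the group sizes, a small subgroup needs a much larger deviation in correlation than a large one to reach the same quality.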


Using this quality measure, we conduct the experiment with 
beam width 20 and search depth 3. In Table 2 and Figure 4, 
we list the top-5 subgroups with exceptional quality score, 
coefficients, and coverage. We can see that some students 
gain extremely high grades with longer behavior sequence 
(cf. Figure 4b, 4e); some students have longer behavior se- 
quence length but lower grades (cf. Figure 4c, 4d); and for 
some subgroups, the length of behavior sequences has no ob- 
vious correlation with the grades (cf. Figure 4f). We can de- 
duce that the efforts that some students spend in the study 
are not directly correlated with their learning results. 


4.3 Exceptional Classifier Behavior Analysis

Students’ behavioral records in MOOCs are sparse, incon- 
sistent and heterogeneous. Learning behavior could be very 
different between different students. This imbalance increases 
the difficulty of training a classifier that can perform well on 
each part of the dataset. This makes it difficult to train a 
model that is qualified for tasks like dropout prediction and 
course passing state prediction. 


Table 4: Course statistics.

course_name | course_level | complete_number | avg_grades | course_enroll_num | max_grades | min_grades | pass_number
Marketing | I | 1141 | 0.105 | 4609 | 1 | 0.006 | 52
Design Thinking | I | 369 | 0.167 | 3483 | 0.972 | 0.01 | 22
IoT | A | 8 | 0.098 | 241 | 0.1 | 0.087 | 0
System Validation (2) | I | 63 | 0.412 | 1010 | 1 | 0.05 | 12
Smart IoT | B | 905 | 0.216 | 6035 | 1 | 0.004 | 100
Computer Architecture | I | 913 | 0.510 | 7652 | 1 | 0.025 | 299
System Validation (4) | A | 17 | 0.597 | 985 | 1 | 0.071 | 9
Quantitative Model (1) | I | 429 | 0.395 | 1807 | 1 | 0.007 | 49
System Validation (3) | A | 45 | 0.418 | 764 | 1 | 0.057 | 11
Quantitative Model (2) | A | 979 | 0.339 | 4975 | 1 | 0.016 | 52
System Validation | I | 601 | 0.376 | 2605 | 1 | 0.04 | 124
Technology | I | 258 | 0.232 | 3930 | 1 | 0.002 | 34
Embedded Systems | I | 549 | 0.291 | 3737 | 1 | 0.02 | 67
Software Architecture | A | 2710 | 0.299 | 10487 | 1 | 0.012 | 331
Real-Time Systems | I | 3615 | 0.203 | 15123 | 1 | 0.006 | 389
IoT Devices | I | 430 | 0.318 | 6609 | 1 | 0.008 | 85
Embedded Hardware | I | 3943 | 0.160 | 19592 | 1 | 0.02 | 128
Open Innovation | I | 480 | 0.137 | 3150 | 0.969 | 0.008 | 24


In this section, we investigate whether learning behavior can
predict whether or not a student can pass the course. At
the same time, we investigate in which parts of the dataset
the classifier does not work well. In Sections 4.1 and 4.2,
we have shown that EMM can effectively discover exceptional
learning behavioral patterns in MOOCs. We will
continue using the EMM framework to find where our predictive
model does not work well in the dataset. Considering
the activities of students in assessments, forum discussions
and peer assignments, we formulate the passing state prediction
problem as follows:


\[ f: X^i \to Y^i \]


Our aim is to train a classifier $f$ that can automatically map
$X^i$ to $Y^i$, where $X^i$ is an 8-tuple $(s^i, m^i, o^i, c^i, b^i, e^i, h^i, p^i)$
feature vector representing the length of the assessment and
question sequence ($s^i$), the number of assessment types ($m^i$),
the number of question types ($o^i$), the number of correctly answered
questions ($c^i$), the number of asked, answered and liked questions
in the forum ($b^i$, $e^i$, $h^i$), and the peer review score ($p^i$), and
where $Y^i$ is the label of passing state: $\{0, 1\}$. We normalize
the features to the range 0 to 1 as the input values.
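
The text does not say which normalization is used to bring the features into the range 0 to 1. One common choice is min-max scaling per feature, sketched below as an assumption rather than as the procedure actually applied; the function name and the convention of mapping constant columns to 0.0 are likewise hypothetical.

```python
def min_max_normalize(rows):
    """Rescale every column of equal-length feature rows to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column carries no information; map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Scaling matters here because the eight features mix sequence lengths, counts, and scores with very different ranges.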


At first, the classifier is trained on the whole dataset. This
model will classify some students correctly and some students
wrongly; in any case we obtain a vector of predicted labels
$\hat{Y}$. These two binary vectors $Y$ and $\hat{Y}$ will agree and
disagree on some students, and that interaction can be used
to capture the quality of the classifier predictions in a single
number. We use the F1 score to capture this:


\[ \varphi_{F_1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2} \]


However, we can perform the exact same computation for
a subset of the vectors $Y$ and $\hat{Y}$, for instance the subset
induced by a subgroup. Thus, we employ $\varphi_{F_1}$ as a quality
measure for EMM.
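
Restricting the F1 computation to a subgroup can be sketched as follows. The record layout (keys `y` and `y_hat` for true and predicted passing state), the function names, and the convention of returning 0.0 when there are no true positives are assumptions made for this illustration.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def subgroup_f1(records, description):
    """F1 of the stored predictions, restricted to records covered by D."""
    covered = [r for r in records if description(r)]
    return f1_score([r["y"] for r in covered], [r["y_hat"] for r in covered])
```

The classifier is trained once on the whole dataset; only the evaluation is repeated per candidate subgroup, which keeps the search cheap.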


We conduct the experiment by setting the search depth to 4
and the beam width to 10. We use an SVM classifier as the
predictive model¹, which has an F1 score of 0.85 on the whole
dataset. In Table 3 we list the top-5 subgroups with exceptional
behavior. We can see that even though the classifier
performs well on the whole dataset, in some subgroups it
does not. Particularly for the students described by descriptions
like “D: Region = MA, Gender = female, Educational
status = COLLEGE NO DEGREE”, the classifier performs
poorly on the prediction task at hand: the support vector
machine has trouble predicting the study success of Massachusetts
women without a college degree. Hence, this group
requires a more sophisticated classifier.


¹One may plug in one’s preferred classifier; SVM selection is
merely meant as an illustration, not an endorsement.


5. CONCLUSIONS 


In this paper, we employ Exceptional Model Mining (EMM) 
for exceptional learning behavior analysis in MOOCs. Rather 
than predicting the success of individual students, which is 
difficult due to the inherent sparsity, inconsistency, and het- 
erogeneity of the data, EMM specializes in identifying co- 
herent groups that behave differently from the norm. Since 
the subgroups resulting from EMM come with an easily in- 
terpretable definition, Exceptional Model Mining allows ed- 
ucators to more effectively channel their remedial activities. 


We employ three EMM model classes for different tasks of 
learning behavior analysis. Experimental results on a real 
Coursera dataset show that for some students, the dropout 
rate is very different from the whole dataset, the learning 
efforts are not always correlated with course grades, and a 
classifier that performs very well on the whole dataset has 
trouble on some subpopulations of the data. In future work, 
we will make use of these discovered exceptional behavioral 
patterns to establish an improved model, which can model 
both normal and exceptional learning behaviors for the stu- 
dents in MOOCs. We plan to develop a modeling method 
that can perform well on each part of the dataset, including 
the exceptional ones. 